Math Arena AI

lean

LLM

Published

April 26, 2025

I was doing some research for a talk and came across this website for recording AI results on math problems: https://matharena.ai/

LLMs have been getting better and better at solving math problems, but there is a worry that all they do is regurgitate what they already know. If an AI has seen a problem before, you can expect it to solve it, but if it hasn’t, all bets are off. Many claims of progress are in fact reporting such memorized solutions.

The website above keeps a record of how AI models are doing on new contests. Human judges review each solution in much the same way that you would judge a student’s solution in a math contest.

The results, as of today, seem to suggest that some AIs are good at solving new problems of a lower difficulty level but struggle with harder problems. This is still quite incredible, and much better than just a year ago, but clearly any claims of exemplary performance are exaggerated.